Title: Integrative multi-omics reveal glial signatures associated with accelerated cognitive decline in Alzheimer’s disease
Overview:
Data Integration: Using Omix, transcriptomics and proteomics data are integrated to identify modules representing distinct biological mechanisms in a discovery cohort.
Projection on ROSMAP Data: These modules are projected onto publicly available early AD (Braak III-IV) data from the ROSMAP cohort.
Molecular Subgroup Identification: Multi-omics clustering is employed to identify molecular subgroups in early AD.
Progression Analysis: The subgroups are analysed for differences in disease progression using Kaplan-Meier survival analysis.
Disease mechanisms activity assessment: The activity of biological mechanisms (module eigenvalues) identified in the discovery cohort is compared between the molecular subgroups in the ROSMAP cohort.
Summary of methods:
Data Pre-processing: Quality control, normalisation, and denoising of RNA and proteomics data.
Integration Models: MOFA for bulk transcriptomics-proteomics integration and iCluster for molecular subtyping.
Downstream analyses:
Community detection within multi-omics transcriptomics-proteomics co-expression networks.
Transcriptomics-proteomics module eigenvalue computation and correlation with neuropathological features.
Cell type and pathway enrichment analyses.
Impact:
This analysis offers a deepened understanding of the biological mechanisms underlying glial activation and cognitive decline in AD, facilitating the identification of novel biomarkers and potential therapeutic targets. The pipeline also establishes a scalable framework for integrating multi-omics data in other neurodegenerative conditions.
Some of these analyses are available in my pre-print: medRxiv 2024.08.27.24312641; doi: https://doi.org/10.1101/2024.08.27.24312641
Omix package:
The goal of Omix is to provide tools in R to build a complete analysis workflow for integrative analysis of data generated from multi-omics platform.
Generate a multi-omics object using
MultiAssayExperiment.
Quality control of single-omics data.
Formatting, normalisation, denoising of single-omics data.
Separate single-omics analyses.
Integration of multi-omics data for combined analysis.
Publication quality plots and interactive analysis reports based of shinyApp.
Currently, Omix supports the integration of bulk transcriptomics and bulk proteomics.
The Omix pipeline requires the following input:
The following datasets are required to run the Omix pipeline:
| Dataset | Description | Format |
|---|---|---|
rawdata_rna |
RNA raw counts, genes as rows, samples as columns | CSV |
rawdata_protein |
Protein abundances, proteins as rows, samples as columns | CSV |
map_rna |
Mapping of RNA sample IDs to metadata | CSV |
map_protein |
Mapping of protein sample IDs to metadata | CSV |
metadata_rna |
RNA-specific metadata | CSV |
metadata_protein |
Protein-specific metadata | CSV |
individual_metadata |
Individual-level metadata for both assays | CSV |
Call required packages for workflow:
library(Omix)
library(ggplot2)
library(readr)
library(dplyr)
library(tidyr)
library(DT)
library(survival)
library(survminer)
First, we must load the data from Omix for the workflow:
outputDir <- tempdir()
ctd_fp <-file.path(outputDir, "ctd.rds")
ensembl_fp <-file.path(outputDir, "ensembl.rds")
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ctd-2.rds",destfile=ctd_fp)
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ensembl_mappings_human.tsv",destfile=ensembl_fp)
ctd <- readRDS(paste0(outputDir,'/ctd.rds'))
ensembl <-read.delim(file=paste0(outputDir,'/ensembl.rds'), sep = '\t', header = TRUE)
rna_qc_data_matrix <- NULL
Download omics data for analysis
outputDir <- tempdir()
download_data <- function(url, dest) {
download.file(url = url, destfile = dest, mode = "wb")
}
# Define file paths
files <- list(
raw_counts = file.path(outputDir, "raw_counts.csv"),
raw_proteomics = file.path(outputDir, "raw_proteomics.csv"),
map_transcriptomics = file.path(outputDir, "map_transcriptomics.csv"),
map_proteomics = file.path(outputDir, "map_proteomics.csv"),
metadata_transcriptomics = file.path(outputDir, "metadata_transcriptomics.csv"),
metadata_proteomics = file.path(outputDir, "metadata_proteomics.csv"),
sample_metadata = file.path(outputDir, "sample_metadata.csv")
)
urls <- list(
raw_counts = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/raw_counts.csv",
raw_proteomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/raw_proteomics.csv",
map_transcriptomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/map_transcriptomics.csv",
map_proteomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/map_proteomics.csv",
metadata_transcriptomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/metadata_transcriptomics.csv",
metadata_proteomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/metadata_proteomics.csv",
sample_metadata = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/sample_metadata.csv"
)
# Download files
Map(download_data, urls, files)
raw_counts<- read.csv(files$raw_counts,header=T, stringsAsFactors = F, row.names=1 , check.names = FALSE)
raw_proteomics<- read.csv(files$raw_proteomics,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)
map_transcriptomics<- read.csv(files$map_transcriptomics,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)
map_proteomics<- read.csv(files$map_proteomics,header=T, stringsAsFactors = F, row.names=1 ,check.names = FALSE)
metadata_transcriptomics<- read.csv(files$metadata_transcriptomics,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)
metadata_proteomics<- read.csv(files$metadata_proteomics,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)
sample_metadata<- read.csv(files$sample_metadata,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)
Sanity checks
all(rownames(metadata_transcriptomics) == colnames(raw_counts))
## [1] TRUE
all(rownames(metadata_proteomics) == colnames(raw_proteomics))
## [1] TRUE
rawdata_rna is a data-frame of raw counts, with features
as rows and samples as columns
print(raw_counts[1:5,1:5])
## 19891053-MTEMP 19891053-SOM 19920001-MTEMP 19920001-SOM
## ENSG00000000003 144 100 417 374
## ENSG00000000005 1 0 3 0
## ENSG00000000419 241 155 204 217
## ENSG00000000457 279 203 326 338
## ENSG00000000460 94 82 169 180
## 19920194-MTEMP
## ENSG00000000003 201
## ENSG00000000005 1
## ENSG00000000419 294
## ENSG00000000457 363
## ENSG00000000460 120
rawdata_protein is a data-frame of raw protein
abundances, with features as rows and samples as columns
print(raw_proteomics[10:15,10:15])
## DPM13s19_SSC_P3WF10_001 20130400_MTG_P1WH9_001
## MAP1LC3B;MAP1LC3B2 76.7033 56.6484
## PGP 27.3285 24.4034
## BTBD17 NA NA
## WIPF3 NA 62.9174
## IGLON5 NA NA
## TUBAL3 8863.2900 7282.9700
## 20130400_SSC_P1WH10_001 A048s13_MTG_P2WE9_001
## MAP1LC3B;MAP1LC3B2 146.2390 117.9950
## PGP 19.7622 29.5640
## BTBD17 NA NA
## WIPF3 68.8211 NA
## IGLON5 10.2975 15.9789
## TUBAL3 8365.1800 8269.4000
## A048s13_SSC_P2WE10_001 A064s13_MTG_P2WF8_001
## MAP1LC3B;MAP1LC3B2 117.574 98.0735
## PGP NA 32.5456
## BTBD17 NA NA
## WIPF3 NA 85.5537
## IGLON5 NA 16.5443
## TUBAL3 13098.500 9600.5800
Maps are data frame containing two columns: primary and
colname. The primary column should be the
individual ID the individual metadata, and the colname the
matched sample names from the raw matrices columns. If sample and
individual ids are the same, maps aren’t needed (primary and colnames
are the same).
print(head(map_transcriptomics))
## primary colname
## 1 19891053-MTG 19891053-MTEMP
## 2 19891053-SSC 19891053-SOM
## 3 19920001-MTG 19920001-MTEMP
## 4 19920001-SSC 19920001-SOM
## 5 19920194-MTG 19920194-MTEMP
## 6 19920194-SSC 19920194-SOM
print(head(map_proteomics))
## primary colname
## 1 PDC016-MTG PDC016_MTG_P3WH10_001
## 2 PDC016-SSC PDC016_SSC_P4WA1_001
## 3 PDC026-MTG PDC026_MTG_P4WA3_001
## 4 PDC026-SSC PDC026_SSC_P4WA4_001
## 5 A019/12-MTG A019s12_MTG_P2WD10_001
## 6 A019/12-SSC A019s12_SSC_P2WE1_001
Technical metadata are data-frames contain the column
colname that should match the sample names in the raw
matrices, and any additional columns related to technical artefacts like
batches
print(head(metadata_transcriptomics))
## primary colname seq_batch
## 19891053-MTEMP 19891053-MTG 19891053-MTEMP IGFQ001167
## 19891053-SOM 19891053-SSC 19891053-SOM IGFQ001167
## 19920001-MTEMP 19920001-MTG 19920001-MTEMP IGFQ001167
## 19920001-SOM 19920001-SSC 19920001-SOM IGFQ001167
## 19920194-MTEMP 19920194-MTG 19920194-MTEMP IGFQ001167
## 19920194-SOM 19920194-SSC 19920194-SOM IGFQ001167
print(head(metadata_proteomics))
## sample_id sample_name batch
## PDC016_MTG_P3WH10_001 PDC016-MTG PDC016_MTG_P3WH10_001 P3
## PDC016_SSC_P4WA1_001 PDC016-SSC PDC016_SSC_P4WA1_001 P4
## PDC026_MTG_P4WA3_001 PDC026-MTG PDC026_MTG_P4WA3_001 P4
## PDC026_SSC_P4WA4_001 PDC026-SSC PDC026_SSC_P4WA4_001 P4
## A019s12_MTG_P2WD10_001 A019/12-MTG A019s12_MTG_P2WD10_001 P2
## A019s12_SSC_P2WE1_001 A019/12-SSC A019s12_SSC_P2WE1_001 P2
individual_metadata contains individual level metadata,
where one column matches the primary column in maps.
library(table1)
pvalue <- function(x, ...) {
# Construct vectors of data y, and groups (strata) g
y <- unlist(x)
g <- factor(rep(1:length(x), times=sapply(x, length)))
if (is.numeric(y)) {
# For numeric variables, perform a standard 2-sample t-test
p <- t.test(y ~ g)$p.value
} else {
# For categorical variables, perform a chi-squared test of independence
p <- chisq.test(table(y, g))$p.value
}
# Format the p-value, using an HTML entity for the less-than sign.
# The initial empty string places the output on the line below the variable label.
c("", sub("<", "<", format.pval(p, digits=3, eps=0.001)))
}
sample_metadata$diagnosis=as.factor(sample_metadata$diagnosis)
sample_metadata$brain_region=as.factor(sample_metadata$brain_region)
table1(~ Braak + PMD + age + trem2 + amyloid + AT8 + PHF1 | diagnosis,
data=sample_metadata[sample_metadata$brain_region=="MTG",], overall=F, extra.col=list(`P-value`=pvalue))
| AD (N=33) |
Control (N=23) |
P-value | |
|---|---|---|---|
| Braak | |||
| 3 | 4 (12.1%) | 0 (0%) | <0.001 |
| 4 | 2 (6.1%) | 0 (0%) | |
| 5 | 9 (27.3%) | 0 (0%) | |
| 5TO6 | 6 (18.2%) | 0 (0%) | |
| 6 | 8 (24.2%) | 0 (0%) | |
| N/A | 4 (12.1%) | 0 (0%) | |
| 0 | 0 (0%) | 5 (21.7%) | |
| 1 | 0 (0%) | 4 (17.4%) | |
| 1TO2 | 0 (0%) | 2 (8.7%) | |
| 2 | 0 (0%) | 12 (52.2%) | |
| PMD | |||
| Mean (SD) | 36.4 (21.4) | 31.2 (16.9) | 0.315 |
| Median [Min, Max] | 30.0 [9.00, 84.0] | 25.0 [9.00, 88.0] | |
| age | |||
| Mean (SD) | 80.4 (10.9) | 80.5 (8.22) | 0.951 |
| Median [Min, Max] | 83.0 [43.0, 94.0] | 80.0 [65.0, 94.0] | |
| trem2 | |||
| CV | 12 (36.4%) | 18 (78.3%) | 0.0048 |
| TREM2 | 21 (63.6%) | 5 (21.7%) | |
| amyloid | |||
| Mean (SD) | 4.90 (4.02) | 0.764 (0.818) | <0.001 |
| Median [Min, Max] | 4.72 [0.0577, 13.0] | 0.585 [0.0456, 2.61] | |
| Missing | 17 (51.5%) | 14 (60.9%) | |
| AT8 | |||
| Mean (SD) | 1.66 (2.31) | 0.118 (0.304) | 0.018 |
| Median [Min, Max] | 0.688 [0.00615, 8.74] | 0.0140 [0.00223, 0.929] | |
| Missing | 17 (51.5%) | 14 (60.9%) | |
| PHF1 | |||
| Mean (SD) | 1.62 (1.21) | 0.0892 (0.222) | <0.001 |
| Median [Min, Max] | 1.84 [0.00839, 4.00] | 0.0105 [0.000519, 0.679] | |
| Missing | 17 (51.5%) | 14 (60.9%) |
To run Omix, we first need to generate a multi-omics object The package currently supports transcriptomics and proteomics bulk data only.
#Generate multiassay object
multiomics_regional=generate_multiassay(rawdata_rna =raw_counts,
rawdata_protein = raw_proteomics,
individual_to_sample=FALSE,
map_rna = map_transcriptomics,
map_protein = map_proteomics,
metadata_rna = metadata_transcriptomics,
metadata_protein = metadata_proteomics,
individual_metadata = sample_metadata,
map_by_column = 'sample_id',
rna_qc_data=FALSE,
rna_qc_data_matrix=NULL,
organism='human')
## ✔ Ensembl ID conversion to gene symbol
## ✔ Retrieval of gene biotype
## ✔ RNA raw data loaded
## class: SummarizedExperiment
## dim: 62656 112
## metadata(1): metadata
## assays(1): rna_raw
## rownames(62656): ENSG00000000003 ENSG00000000005 ... ENSG00000291316
## ENSG00000291317
## rowData names(3): ensembl_gene_id gene_name gene_biotype
## colnames(112): 19891053-MTEMP 19891053-SOM ... IGF122241 IGF122246
## colData names(3): primary colname seq_batch
## ✔ Protein raw data loaded
## class: SummarizedExperiment
## dim: 3228 112
## metadata(1): metadata
## assays(1): protein_raw
## rownames(3228):
## IGKV2-28;IGKV2-29;IGKV2-30;IGKV2-40;IGKV2D-26;IGKV2D-28;IGKV2D-29;IGKV2D-30;IGKV2D-40
## IGKV3-11;IGKV3D-11 ... FAM169A SEC23IP
## rowData names(1): gene_name
## colnames(112): PDC016_MTG_P3WH10_001 PDC016_SSC_P4WA1_001 ...
## 20196928_MTG_P2WD4_001 20196928_SSC_P2WD5_001
## colData names(3): sample_id sample_name batch
## ✔ MultiAssayExperiment object generated!
The MultiAssayExperiment object was succesfully created. The
following steps will process and perform QC on each omic layers of the
multiomics_object object.
multiomics_regional=process_rna(multiassay=multiomics_regional,
transformation='vst',
protein_coding=FALSE,
min_count = 10,
min_sample = 0.5,
dependent = "diagnosis",
levels = c("Control","AD"),
covariates=c('age','sex','PMD'),
filter=TRUE,
batch_correction=TRUE,
batch="seq_batch",
remove_sample_outliers= FALSE)
## ✔ NORMALISATION & TRANSFORMATION
## ✔ GENE FILTERING
## ✔ QC parameters selected require genes to have at least 50 % of samples with counts of 10 or more
## ✔ 42425 / 62656 genes were dropped
## ✔ 20231 genes kept for analysis
## ✔ VST TRANSFORMATION
## ✔ BATCH CORRECTION
## ✔ Using Limma on seq_batch to remove technical artefacts, and age as biological confoundersUsing Limma on seq_batch to remove technical artefacts, and sex as biological confoundersUsing Limma on seq_batch to remove technical artefacts, and PMD as biological confounders
## ✔ Transcriptomics data processed!
## ✔ Processing parameters saved in metadata
multiomics_regional=process_protein(
multiassay=multiomics_regional,
filter=TRUE,
min_sample = 0.5,
dependent = "diagnosis",
levels = c("Control","AD"),
imputation = 'minimum_value',
remove_feature_outliers= FALSE,
batch_correction= TRUE,
batch="batch",
correction_method="median_centering",
remove_sample_outliers=FALSE,
denoise=TRUE,
covariates=c('PMD','sex','age'))
## ✔ SCALING NORMALIZATION
## ✔ FILTERING
## ✔ 283 / 3228 proteins filtered
## ✔ 2945 proteins kept for analysis
## ✔ IMPUTATION
## ✔ Imputation of remaining missing values based on
## 50% of minimum level of abundance for each protein
## ✔ BATCH CORRECTION
## ✔ DENOISING BIOLOGICAL COVARIATES
## ✔ Using Limma on PMD as biological confoundersUsing Limma on sex as biological confoundersUsing Limma on age as biological confounders
Omix supports a range of vertical integration models:
Possible integration methods are
MOFA,DIABLO,sMBPLS,iCluster,MEIFESTO
The choice of the integration models depends on the research use case of interest.
Here we display the use of Omix on a popular integration method, MOFA or Multi-Omics Factor analysis (Argelaguet et al. 2018).
In this workflow, we proceed with a multi-omics integration of 56 brain Alzheimer’s disease (AD) and Control samples coming from two brain regions
- The somatosensory cortext (SOM)
- The middle frontal gyrus (MTG)
The MTG is known to be affected earlier during AD progression, while SOM at later stages. Using these two regions as pseudotemporal proxi in the integrative process, we are able to gain a deeper understanding of biological mechanisms that occur during AD progression.
multiomics_regional=vertical_integration(multiassay=multiomics_regional,
slots = c(
"rna_processed",
"protein_processed"
),
integration='MOFA',
ID_type = "gene_name",
dependent='diagnosis',
intersect_genes = FALSE,
num_factors = 15,
scale_views = TRUE,
most_variable_feature=TRUE)
Since package version slightly affect model outputs, we reload a pre-trained model.
multiomics_regional <- readRDS("~/MSD/multiomics_regional_pretrained.rds") # load pre-trained model
Omix provides a range of built-in downstream
analyses functions and visualisations. All downstream analyses will be
performed on the integrated object stored in the integrated
slot.
integrated_object=multiomics_regional@metadata$integration$MOFA
metadata=integrated_object@samples_metadata
plot=correlation_heatmap(integrated_object,
covariates=c("age","PMD",'AD','MTG','SOM',
'amyloid','AT8','PHF1',
'Early','Mid','Late'))
MOFA2::plot_variance_explained(integrated_object, max_r2=5)
Factors 14 is chosen downstream analysis:
Strongly correlated with neuropathological variables including PHF1 and amyloid.
Some level of shared variance across modalities
Factor 1, 7 and 11 are also chosen for downstream analysis in the pre-print, though are not shown in this workflow.
Weights vary from -1 to +1, and provide a score for how strong each feature relates to each factor, hence allowing a biological interpretation of the latent factors. Features with no association with the factor have values close to zero, while genes with strong association with the factor have large absolute values.
The sign of the weight indicates the direction of the effect:
A positive weight indicates that the feature has higher levels in the samples with positive factor values.
A negative weight indicate higher levels in samples with negative factor values.
The extract_weigthsfunction enable to extract the
weights on the desired factor at a defined absolute threshold (1.5 SD
from the mean), and return an object with positive and negative weights
above and below this threshold respectively. It also returns QC
plots
A distribution of the feature weights for each omic layer at the designated factor
The relationship between the
feature/sense_check_variable correlation and weights. High
weights should coincide with stronger correlation if the
sense_check_variable is an important driver of variation in
the designated factor.
weights14=extract_weigths(integrated_object,
factor=14,
threshold=1.5,
sense_check_variable='PHF1')
weights14$distribution_plot$rna
weights14$distribution_plot$protein
weights14$weights_cor_plot$rna
weights14$weights_cor_plot$protein
A negative weight indicates that the feature has higher levels in the samples with negative factor values, which in this analysis coincides with later Braak stages.
integrated_object@samples_metadata$Braak=as.factor(integrated_object@samples_metadata$Braak)
MOFA2::plot_factor(integrated_object,
factors = 14,
color_by = "Braak",
add_violin = TRUE,
dodge = TRUE
)
Based on the features (genes and proteins) selected to be more than 1.5 SD away from the mean, we build a co-expression network of genes and proteins. An edge is drawn between features that have a correlation => 0.3 (moderate correlation)
Weights14_up=multiomics_network(multiassay=multiomics_regional,
list=weights14$weights$ranked_weights_positive,
correlation_threshold =0.3,
filter_string_50= FALSE)
The next step of the analysis is to find densily co-expressed communities of proteins and RNAs within the multi-omics network.
This function tries to find densely connected subgraphs in a graph by calculating the leading non-negative eigenvector of the modularity matrix of the graph.
communities14 <- communities_network(igraph=Weights14_up$graph,
community_detection='leading_eigen')
plot_communities(igraph=Weights14_up$graph,communities14$community_object)
This table summaries the memberships of the four detected modules.
Here we proceed with a EWCE analysis to check if modules
are enriched in specific cell types.
com=lapply(communities14$communities,function(x){sub("\\_.*", "",x )})
com=lapply(com,function(x){sub("\\..*", "",x )})
com=lapply(com,function(x) {x[!duplicated(x)]})
cell_type14 <- cell_type_enrichment(
multiassay =multiomics_regional,
communities = com,
ctd= ctd
)
cell_type14$plots
Let’s look at the features in more details:
community_1=community_graph(igraph=Weights14_up$graph,
community_object=communities14$community_object,
community=1)
interactive_network(igraph=community_1$graph,communities=FALSE)
print((sort(community_1$hubs, decreasing = TRUE)))
## LAPTM5_rna SYK_rna ITGB2_rna LAIR1_rna
## 1.000000000 0.982149699 0.972346799 0.971721165
## PIK3AP1_rna CSF1R_rna VSIG4_rna SASH3_rna
## 0.960183008 0.953006971 0.952624008 0.951664689
## C1QC_rna ADORA3_rna C3AR1_rna HCLS1_rna
## 0.949014019 0.944848796 0.944807533 0.942727577
## SLA_rna TLR7_rna C1QA_rna CD74_rna
## 0.937214252 0.933653051 0.929875015 0.928283120
## CD68_rna CD84_rna MS4A6A_rna MSR1_rna
## 0.927532397 0.924678571 0.921285461 0.920621022
## TLR2_rna SLC1A5_rna SCIN_rna HLA-DRA_rna
## 0.908127394 0.906895335 0.902956379 0.902213789
## C3_rna SLC2A5_rna TREM2_rna LCP1_rna
## 0.902069624 0.901973392 0.901925764 0.900337973
## CLEC7A_rna PTPRC_rna ALOX5AP_rna HLA-DMB_rna
## 0.886166848 0.885844178 0.879270418 0.873682010
## C1QB_rna PARVG_rna SIGLEC8_rna MS4A7_rna
## 0.871849862 0.866844684 0.865898504 0.863306476
## TMEM119_rna ARHGAP45_rna CCR1_rna CD14_rna
## 0.858300396 0.846387610 0.843056253 0.835487918
## RUNX1_rna ARHGAP30_rna CD163_rna SIGLEC10_rna
## 0.831813982 0.825698973 0.822274667 0.819929730
## KLHL6_rna TFEC_rna SLAMF8_rna SPN_rna
## 0.813149180 0.800551700 0.785737487 0.783688227
## FPR3_rna LPCAT2_rna ADAM28_rna CSF3R_rna
## 0.776793900 0.772304444 0.763707387 0.760866815
## TMC8_rna FCGBP_rna CYBB_rna FCGR1A_rna
## 0.749254280 0.748944887 0.746216938 0.743329331
## PIK3R5_rna SELPLG_rna CD53_rna COL8A2_rna
## 0.743063442 0.730084497 0.721317129 0.718399474
## GPR34_rna CLEC5A_rna ENSG00000287684_rna ALOX15B_rna
## 0.655585949 0.655576738 0.644725355 0.625379303
## ENSG00000288156_rna CAPG_rna APOBR_rna S100A4_rna
## 0.616952019 0.569917697 0.556936344 0.555176207
## SIGLEC1_rna LINC01094_rna GEM_rna CXCR4_rna
## 0.523269236 0.518021686 0.482475854 0.458833513
## CX3CR1_rna IL1R2_rna AQP9_rna P2RY12_rna
## 0.452828319 0.391044304 0.319762213 0.301242760
## NAMPT_protein GPNMB_rna RP11-160E2.6_rna AK4_protein
## 0.297996842 0.168507730 0.132732212 0.130166897
## SNORD17_rna NEB_rna VPS26A_protein MOXD1_rna
## 0.057233309 0.054447384 0.023175182 0.014634177
## CH507-24F1.2_rna
## 0.005796173
community_2=community_graph(igraph=Weights14_up$graph,
community_object=communities14$community_object,
community=2)
interactive_network(igraph=community_2$graph,communities=FALSE)
functional_enrichment_14=list()
functional_enrichment_14=lapply(communities14$communities, function(x){
communities_l = sub("\\_.*", "",x)
pathway_analysis_enrichr(communities_l,plot=20)
})
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!
Here are the biological mechanisms for the microglial module
Here are the biological mechanisms for the astrocyte module
multiomics_modules compute the module eigen value (PC1
of scaled transcriptomics/proteomics expression) and correlates it to
chosen covariates
modules_F14_eigenvalues=multiomics_modules(multiassay=multiomics_regional,
metadata=MOFA2::samples_metadata(integrated_object),
covariates=c('PHF1','amyloid'),
communities=communities14$communities,
filter_string_50=TRUE)
The slot modules_F14_eigenvalues$modules_eigen_value
stores the module eigenvalues per sample, which can be correlated to any
covariates of interest.
These steps were repeated for Factor 1, Factor 7, anf Factor 11, all significantly associated to AD in order to derive a comprehensive picture of mechanisms at play. These are summarised in the figure below:
Bulk proteomics and transcriptomics data from the ROSMAP cohort was collected from Synapse and processed the same way as step I.
In this part, we focus on early AD Braak III-IV donors (n=132).
multimodal <- readRDS("~/MSD/multimodal.rds")
multimodal$rna_processed=data.frame(t(multimodal$rna_processed))
multimodal$protein_processed=data.frame(t(multimodal$protein_processed))
iCluster is an integrative clustering framework designed to jointly analyze multi-omics datasets to uncover shared molecular patterns.By leveraging Bayesian latent variable models, iCluster identifies subgroups based on correlated features across multiple data types, here proteomics and transcriptomics.
int_clust=integrate_with_iCluster(multimodal,
try.N.clust = 2:4)
# m1=clustering_redo$multimodal_object$rna_processed
# m2=clustering_redo$multimodal_object$protein_processed
#
#
# model <- iClusterPlus::iClusterBayes(
# dt1 = t(m1),
# dt2 = t(m2),
# dt3 = NULL, dt4 = NULL, dt5 = NULL, dt6 = NULL,
# type = c("gaussian", "gaussian"),
# K =1,
# n.burnin=18000,
# n.draw=12000,prior.gamma=c(0.5,0.5),sdev=0.05,thin=3
# )
int_clust <- readRDS("~/MSD/int_clust.rds") # load pre-trained model
The optimal number of cluster (i.e. molecular subtypes) is 2.
Posterior probabilities in
int_clust[["model"]][["beta.pp"]] indicate features driving
the cluster assignment.
Let’s add the molecular subtype annotation to the metadata
cog_metadata <- readRDS("~/MSD/cog_metadata.rds")
cog_metadata$cluster=int_clust$model$clusters
First, we check for survivor bias before Kaplan-Meier analysis to ensure that differences in survival curves are not confounded by unequal follow-up durations or dropout rates across groups, which could skew the results.
keep=cog_metadata$years<=12
ggpubr::ggscatter(cog_metadata,
y = "years", x = "age_death",
add = "reg.line", # Add regressin line
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
conf.int = TRUE, # Add confidence interval
cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
cor.coeff.args = list(method = "pearson", label.sep = "\n")
)+ viridis::scale_color_viridis()+ylab("Number of follow-up years")+xlab("Age at death")
ggpubr::ggscatter(cog_metadata[keep,],
y = "years", x = "age_death",
add = "reg.line", # Add regressin line
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
conf.int = TRUE, # Add confidence interval
cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
cor.coeff.args = list(method = "pearson", label.sep = "\n")
)+ viridis::scale_color_viridis()+ylab("Number of follow-up years")+xlab("Age at death")
Based on these results, individuals with a follow-up equal or less than 12 years will be kept for Kaplan Meier analysis, and the rest will be filtered out.
Kaplan-Meier analysis to assess delay in AD onset between molecular subgroups
cog_metadata <- readRDS("~/MSD/cog_metadata.rds")
cog_metadata$time = ifelse(cog_metadata$turn==0,cog_metadata$age_death ,cog_metadata$age_bl+cog_metadata$year_onset)
# turn variable is 0 if individual never turns AD, 1 if it turns AD
#Using age as the time variable for Kaplan-Meier analysis is appropriate when the event of interest (here onset of Alzheimer's Disease) is closely tied to an individual's chronological age.
cog_metadata$time=ifelse(is.na(cog_metadata$time ),cog_metadata$age_death ,cog_metadata$time)
## for survivor bias
keep=cog_metadata$years<=12
cfit3 <-survfit(Surv(time, turn) ~ cluster , data = cog_metadata,subset = keep )
ggsurvplot(cfit3,
pval = TRUE, conf.int = F,
xlim = c(75,105),
break.x.by = 5,
pval.coord = c(75,0.2),
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_classic(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"),
ylab="Remain without AD",xlab="Age of AD onset")
Kaplan-Meier survival analysis revealed that one molecular subgroup
had a significantly faster prodromal progression to dementia compared to
the other molecular subgroup. Cluster 2 is renamed as the
fast progressing cluster
cog_metadata$cluster_group=ifelse(cog_metadata$cluster==2,"Fast","Slow")
We project all the modules found to be signficantly associated with AD to the ROSMAP cohort. These were selected using differential eigenvalue expression which is detailed in the pre-print.
modules_list_omic <- readRDS("~/MSD/modules_list_omic.rds")
sig_modules <- readRDS("~/MSD/sig_modules.rds")
modules_list_omic =modules_list_omic[sig_modules]
# Create a new list to store only the 'Protein' vectors
protein_list <- lapply(modules_list_omic, function(x) x$Protein)
rna_list<- lapply(modules_list_omic, function(x) x$RNA)
rosmap_RNA=data.frame(t(multimodal$rna_processed))
rosmap_prot=data.frame(t(multimodal$protein_processed))
list_eigen=list()
for(module in sig_modules){
# Recalculate MEs with color labels
rna_f=rna_list[[module]]$feature
rna_f=rna_f[which(rna_f %in% colnames(rosmap_RNA))]
rna=rosmap_RNA[,rna_f]
prot_f=protein_list[[module]]$feature
prot_f=prot_f[which(prot_f %in% colnames(rosmap_prot))]
prot=rosmap_prot[,prot_f]
df=cbind(rna,prot)
df_s=scale(df)
moduleColors=rep(module,ncol(df))
moduleColors=setNames( moduleColors,colnames(df))
list_eigen[[module]] <-data.frame(individualID=rownames(df_s),WGCNA::moduleEigengenes(df_s,
moduleColors)$eigengenes)
}
We can now add the module eigenvalue to the ROSMAP clinical metadata
df <- Reduce(function(x, y) merge(x, y, by = "individualID", all = TRUE), list_eigen)
df_long <- df %>%
pivot_longer(cols = -individualID, names_to = "variable", values_to = "value")
metadata_long= merge(cog_metadata,df_long, by="individualID")
metadata_long$variable <- sub("ME", "", metadata_long$variable)
ggplot(metadata_long, aes(x = cluster_group, y = value, fill = cluster_group)) +
geom_boxplot(alpha = 0.4, aes(fill = cluster_group), outlier.shape = NA) +
geom_jitter(aes(color = cluster_group), alpha = 0.4) +
stat_compare_means(label = "p.signif", ref = "Slow", method = "wilcox") +
facet_wrap(~variable, nrow = 4) + # Set the number of rows for facets
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_manual(values = c("#2E9FDF", "#E7B800")) +
scale_color_manual(values = c("#2E9FDF", "#E7B800"))
There are singificant differences in module eigenvalue between the
fast and slow subtypes, particularly for the microglia
F14_UP_ME1 and astrocyte F14_UP_ME2
modules.
TREM2 agonists, such as AL002 developed by Alector, in phase 2 clinical trial (NCT04592874), exploit this protective role of plaque compaction in early AD, targeting individuals in early AD with mild cognitive impairment (MMSE ≥20).
On November 25th, the trial results showed that the TREM2 agonist was not effective in in slowing disease progression in individuals with early AD.
Conversely, our data indicated that strategies to instead downregulate microglial responses may be a better strategy to prevent further damage in mid-pathology stages (Braak III-IV), at a stage where microglial overactivation most likely exacerbates neuroinflammation.
cog_metadata$alector= ifelse(cog_metadata$mmse>=20,"Early mmse >=20","Later mmse<20")
cog_metadata= merge(cog_metadata,df, by="individualID")
ggplot(cog_metadata, aes(x = cluster_group, y = MEF14_UP_ME1, fill = cluster_group)) +
geom_boxplot(alpha = 0.4, aes(fill = cluster_group), outlier.shape = NA) +
geom_jitter(aes(color = cluster_group), alpha = 0.4) +
stat_compare_means(label = "p.signif", ref="Slow", method = "wilcox") +
facet_wrap(~alector) +scale_fill_manual(values=c( "#2E9FDF","#E7B800"))+scale_color_manual(values=c( "#2E9FDF","#E7B800"))+ylab("F14_UP_ME1 - microglia module")
The microglial module F14_UP_ME1, characteristic of
microglial activation (pathways) and containing TREM2 shows elevated in
fast progressing patients with mild cognitive impairment (MMSE ≥20),
which suggest that reducing these levels might slow Alzheimer’s disease
progression.
Conclusion:
This workflow demonstrates how Omix facilitates end-to-end analysis of multi-omics data, uncovering glial mechanisms in AD progression and providing insights into therapeutic strategies.
Part I - Transcriptomics-Proteomics Integration in the Discovery Cohort
The integration of transcriptomics and proteomics data using Omix successfully identified biologically meaningful multi-omics modules in Alzheimer’s disease. By leveraging MOFA for pseudotemporal multi-omics integration, we uncovered distinct biological mechanisms linked to cognitive decline and neuropathological markers. This step highlights the power of Omix in performing data preprocessing, normalisation, and integration to explore complex relationships between molecular data modalities.
Part II - Molecular Subtyping in the ROSMAP Cohort
Multi-omics clustering applied to the ROSMAP cohort stratified individuals into two molecular subgroups with distinct progression rates. Kaplan-Meier survival analysis revealed that one molecular subgroup had a significantly faster prodromal progression to dementia compared to the other molecular subgroup. These findings emphasise the heterogeneity of Alzheimer’s disease and suggest that molecular subtyping can provide clinically relevant insights into disease trajectory.
Part III - Projecting Modules and Assessing Differences
The projection of discovery cohort-derived modules to the ROSMAP cohort demonstrated robust module reproducibility and highlighted differential module activity between the fast and slow subgroups. Microglial- and astrocytic-enriched modules showed significant variability in their expression profiles, aligning with known biological differences between the two subgroups. These results suggests that different underlying molecular processes may explain a part of between-patient variability in decline.
Part IV - Relevance for Alector’s TREM2 agonist clinical trial stratification
Our findings provide a molecular explanation for the failure of the TREM2 agonist trial in slowing Alzheimer’s progression in early stages. While microglial responses may be protective in early pathology, our data suggest that downregulating microglial overactivation could be more effective in mid-pathology stages. This underscores the importance of stratified therapeutic approaches tailored to the disease stage and molecular subtype, offering a new direction for future clinical trials.